A Profiling Method for Analyzing Scalability Bottlenecks on Multicores

Authors

  • David Eklov
  • Nikos Nikoleris
  • Erik Hagersten
Abstract

A key goodness metric of multi-threaded programs is how their execution times scale when increasing the number of threads. However, there are several bottlenecks that can limit the scalability of a multi-threaded program, e.g., contention for shared cache capacity and off-chip memory bandwidth; and synchronization overheads. In order to improve the scalability of a multi-threaded program, it is vital to be able to quantify how the program is impacted by these scalability bottlenecks. We present a software-only profiling method for obtaining speedup stacks. A speedup stack reports how much each scalability bottleneck limits the scalability of a multi-threaded program. It thereby quantifies how much its scalability can be improved by eliminating a given bottleneck. A software developer can use this information to determine what optimizations are most likely to improve scalability, while a computer architect can use it to analyze the resource demands of emerging workloads. The proposed method profiles the program on real commodity multi-cores (i.e., no simulations required) using existing performance counters. Consequently, the obtained speedup stacks accurately account for all idiosyncrasies of the machine on which the program is profiled. While the main contribution of this paper is the profiling method to obtain speedup stacks, we present several examples of how speedup stacks can be used to analyze the resource requirements of multi-threaded programs. Furthermore, we discuss how their scalability can be improved by both software developers and computer architects.
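As an illustration of what a speedup stack reports, the following minimal sketch splits the ideal speedup of an N-thread run into the achieved speedup plus one "lost speedup" component per bottleneck. The abstract does not give the paper's formulas or measurement procedure, so everything here is an assumption: the simple accounting (per-thread time lost to a bottleneck, divided by the parallel run time), the function name `speedup_stack`, its inputs, and the numbers in the usage example are all hypothetical and are not the authors' method or data.

```python
from typing import Dict, List, Tuple


def speedup_stack(
    parallel_time: float,
    lost_time_per_thread: Dict[str, List[float]],
) -> Tuple[int, Dict[str, float]]:
    """Build an illustrative speedup stack (assumed accounting, not the paper's).

    parallel_time: run time of the N-thread execution (hypothetical input).
    lost_time_per_thread: maps a bottleneck name to a list with one entry per
        thread, the time that thread lost to that bottleneck (hypothetical input).

    Returns (N, stack), where the stack components sum to the ideal speedup N.
    """
    if parallel_time <= 0:
        raise ValueError("parallel_time must be positive")
    if not lost_time_per_thread:
        raise ValueError("at least one bottleneck is required")

    thread_counts = {len(lost) for lost in lost_time_per_thread.values()}
    if len(thread_counts) != 1:
        raise ValueError("every bottleneck needs one entry per thread")
    n_threads = thread_counts.pop()

    # Assumption: time lost by all threads to a bottleneck, normalised by the
    # parallel run time, is the speedup lost to that bottleneck.
    stack = {
        name: sum(lost) / parallel_time
        for name, lost in lost_time_per_thread.items()
    }

    # Whatever is not lost to a listed bottleneck is the achieved speedup.
    stack["achieved speedup"] = n_threads - sum(stack.values())
    return n_threads, stack


if __name__ == "__main__":
    # Made-up numbers for a 4-thread run, for illustration only.
    n, stack = speedup_stack(
        parallel_time=10.0,
        lost_time_per_thread={
            "shared cache capacity": [1.0, 1.0, 1.0, 1.0],
            "off-chip memory bandwidth": [2.0, 2.0, 2.0, 2.0],
            "synchronization": [0.5, 0.5, 0.5, 0.5],
        },
    )
    print(f"ideal speedup: {n}")
    for name, height in stack.items():
        print(f"{name}: {height:.2f}")
```

Under this accounting, the height of a component is an estimate of how much speedup could be regained by eliminating that bottleneck, which is the kind of information the abstract says a developer or architect would read off the stack.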

Similar resources

Toward Scalable Transaction Processing

Designing scalable transaction processing systems on modern multicore hardware has been a challenge for almost a decade. The typical characteristics of transaction processing workloads lead to a high degree of unbounded communication on multicores for conventional system designs. In this tutorial, we initially present a systematic way of eliminating scalability bottlenecks of a transaction proc...

Non-Uniform HEVC Tile Partitioning Method for Asymmetric Multicores

This paper proposes a novel high efficiency video coding (HEVC) Tile partitioning method for parallel processing, based on an analysis of the computing ability of asymmetric multicores. The proposed method (i) analyzes the computing ability of asymmetric multicores and (ii) builds a regression model of computational complexity per video resolution. Finally, the model (iii) determines the optimal HEVC...

A Methodology for Accurate, Effective and Scalable Performance Analysis of Application Programs

We describe a unique and comprehensive methodology for accurately measuring and effectively analyzing the performance of an application’s execution. This methodology is 1) accurate, because it assiduously avoids systematic measurement error (such as that introduced by instrumentation); 2) effective, because it associates useful performance metrics (such as memory bandwidth) with important sourc...

Characterizing the Performance and Scalability of Many-core Applications on Virtualized Platforms

Clouds have become attractive to applications because of their low cost and on-demand computing model, enabled by virtualization technologies. As the number of cores per chip continues to increase, it is becoming urgent to study and improve the scalability of virtualized platforms. This paper studies the horizontal scalability of a set of parallel applications on virt...

Profiling Distributed File Systems with Computer Animation

Achieving performance, reliability, and scalability has proven difficult for distributed file systems. Placement of data, load distribution and other overheads are often the culprits. Profiling is a useful technique for understanding file system behavior, improving performance and debugging problems. Existing file system profiling methods often examine fine-grained system activity, such as the ...

Publication date: 2012